UG/Abi: a highly diverse family of prokaryotic reverse transcriptases associated with defense functions


Abstract

Reverse transcriptases (RTs) are enzymes capable of synthesizing DNA using RNA as a template. Within the last few years, a burst of research has led to the discovery of novel prokaryotic RTs with diverse antiviral properties, such as DRTs (Defense-associated RTs), which belong to the so-called unknown groups (UG) of RTs and are closely related to the Abortive Infection system (Abi) RTs. In this work, we performed a systematic analysis of UG and Abi RTs, expanding the UG/Abi family to 42 highly diverse groups, most of which are predicted to be functionally associated with other gene(s) or domain(s). Based on this information, we classified these systems into three major classes. In addition, we reveal that most of these groups are associated with defense functions and/or mobile genetic elements, and we demonstrate the antiphage role of four novel groups. We also highlight the presence of one of these systems in novel families of human gut viruses infecting members of the Bacteroidetes and Firmicutes phyla. This work lays the foundation for a comprehensive and unified understanding of these highly diverse RTs with enormous biotechnological potential.

INTRODUCTION

Reverse transcriptases (RTs, also known as RNA-directed DNA polymerases) are enzymes present in all three domains of life whose main function is to polymerize DNA strands using RNA as a template. Although RTs were first discovered by Temin & Baltimore in 1970 (1,2), prokaryotic RTs were not observed until 1989, when they were found to be the main component of retrons (3). Later research revealed that most prokaryotic RTs (80%) can be phylogenetically clustered into three major lineages: group II introns, diversity-generating retroelements (DGRs) and retrons. These are the best-known lineages, owing to their ecological implications and biotechnological applications. Other minor clades of RTs include abortive infection (Abi) RTs, CRISPR-Cas-associated RTs, Group II-like (G2L) RTs, the unknown groups (UG) and rvt elements (4–6).

A comprehensive and systematic analysis of prokaryotic RTs (7) identified the association of RTs with CRISPR-Cas systems and five novel gene families (D, E, F1, F2 and G, now known as UG9, UG6, UG1, UG5 and UG3 + UG8, respectively). Further research revealed other uncharacterized RTs from distinct clades (UG1-UG16 and Group II-like, including those described by Kojima & Kanehisa) (6,8,9). A more recent study disclosed 11 additional UG RT groups (UG17-UG28) and suggested that UG and Abi RTs may form a novel major lineage branching off from a common node (5). Although UG and Abi RTs were initially thought to be uncommon, they are now known to represent at least 11% of all prokaryotic RTs. They show enormous diversity and hold great promise for the development of new biotechnological tools (4).

Abi systems can function as prokaryotic defense mechanisms against certain phages (10,11). They generally consist of a sensing module that recognizes a phage-specific signal and an effector module that generates a response. This response either blocks the viral infection cycle, halts host metabolism or causes cell death (12). Although there are >20 different Abi systems, only a few have been well characterized. Among them, AbiA, AbiK and AbiP2 share an N-terminal RT domain (13–17). AbiA also harbors an additional C-terminal HEPN (higher eukaryotes and prokaryotes nucleotide-binding) domain with predicted RNase activity (18). Both AbiA and AbiK are commonly found on plasmids and have been shown to protect Lactococcus spp. from diverse phage infections (13). AbiP2, by contrast, is commonly found in a hypervariable region of P2 prophages in E. coli and confers resistance against phage T5 (17). It has been hypothesized that AbiA and AbiK share a similar mechanism of action, as both confer protection against the same phages, either by blocking DNA replication or by targeting functionally related proteins (11). Furthermore, phages escaping AbiK and AbiA interference have been shown to harbor point mutations in single-strand annealing proteins (SSAPs) involved in DNA replication (19). AbiK is the best characterized of these systems and has been hypothesized to have protein-primed, untemplated DNA polymerase activity (14). The residues responsible for this activity are thought to be located in the C-terminal region (14), which, along with the RT domain, is essential for its biological role (13,20).

Although Abi and UG RTs are phylogenetically related, they were thought to be functionally unrelated because their RT domains are only distantly related. Previous analyses had also pointed out the high sequence divergence among AbiK, AbiA and AbiP2 (6). Nevertheless, recent research using a systematic methodology to search for novel antiphage systems showed that some UG RTs act as defense mechanisms against bacterial viruses (21). These are UG1, UG2, UG3 + UG8, UG15 and UG16, named DRT types 1–5, respectively (DRT, Defense-associated RT). This suggests a functional link between UG and Abi RTs and supports the idea that different families of RTs may be implicated in immunity against bacteriophages (4).

Some UG and Abi RT members have been reported to have antiphage functions and associated domain(s) required for this function. Others remain poorly characterized due to insufficient information on their genomic context, associated genes or biological roles. We considered the great diversity of these RTs, their possible common origin and the recently disclosed role of retrons and DRTs (21–24). On this basis, we hypothesize that the lineage composed of UG and Abi RTs (UG/Abi) may constitute a novel family of defense-related RTs with high divergence and a plethora of associated genes. In this work, we performed a systematic analysis of UG/Abi RTs and their genomic neighborhoods in search of associated genes and defense hallmarks. As a result, we expanded the number and diversity of UG/Abi RTs with novel groups, most of which are associated with other protein domain(s) and located within defense islands/hotspots. Based on this information, UG/Abi RTs can be classified into three major classes:

- Class 1: RTs fused to HEAT-like repeats.
- Class 2: highly diverse RTs not fused to any known domain.
- Class 3: RTs commonly associated with C-N hydrolase (carbon-nitrogen hydrolase, also known as nitrilase) or phosphohydrolase domains.

We also demonstrate the antiphage activity of three Class 1 members and an additional Class 2 member. Moreover, we reveal that UG27, a Class 2 UG/Abi RT, is commonly encoded in several groups of predicted human gut viruses infecting members of the Bacteroidetes and Firmicutes phyla. These groups encode a putative non-coding RNA (ncRNA) with a common secondary structure. Finally, several UG groups with reported antiphage properties (21) also encode ncRNAs with conserved secondary structures, which are thought to be essential for the functioning of these systems. These are UG2, UG3 + UG8 and UG28, corresponding to DRT type 2, DRT type 3 and DRT type 9, respectively.
Altogether, these findings reveal that the UG/Abi RT family is a highly diverse and widespread lineage of prokaryotic reverse transcriptases associated with defense functions. This lineage may play an important role in virus-host conflicts.

MATERIALS AND METHODS

Construction of a comprehensive dataset of representative UG/Abi RTs

To increase the number of UG/Abi RT sequences, we used as a reference the most up-to-date phylogenetic tree of prokaryotic RTs, based on an alignment of the RT domain of 9141 RTs (5). Custom HMM profiles were built with hmmbuild from the HMMER 3.3 suite (25) for every phylogenetic group: group II introns, retrons, DGRs, CRISPR-associated RTs, G2L and Abi/UG RTs. Then, the NR database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/) was searched (February 2021) with all profiles using hmmsearch (E-value […] complete genomes > scaffolds/contigs). If a sequence was not found in any complete genome, sequences in contigs with the most neighborhood information were prioritized. These are large contigs where the RT is not located close to the contig ends. They were chosen by selecting the highest W value, a parameter that captures the length of the contig weighted by how centered the ORF is in the contig. W is defined as follows:

$$L = \text{length of the contig}$$

$$p = \frac{\text{absolute end of CDS} + \text{absolute start of CDS}}{2}$$

$$W = L - \left| p - \frac{L}{2} \right|$$

Here p is the midpoint of the CDS on the contig. W therefore equals L when the CDS is perfectly centered and falls towards L/2 as the CDS approaches either contig end.

Phylogenetic and network analysis of UG/Abi RTs

The cd-hit-2d tool (27) was used to label sequences highly similar (>95% AAI) to the reference entries. A multiple sequence alignment (MSA) of the RT1-7 motifs was then built using MAFFT (28) with default parameters. From it, a phylogenetic tree was inferred using FastTree (29) with the WAG evolutionary model and a discrete gamma model with 20 rate categories. A second phylogenetic tree was constructed with IQ-TREE v1.6.12 (30), using the LG + R10 model, which ModelFinder (31) identified as the best-fitting model. Branch support came from 1000 ultrafast bootstrap replicates (UFBoot) and an SH-like approximate likelihood ratio test (SH-aLRT) with 1000 replicates (-bb 1000 -alrt 1000 options). To compare full-length sequences, a sequence-similarity network (SSN) was also built using the EFI-EST resource (32). It was visualized in Cytoscape (33) with a force-directed layout, using the BLAST score as the edge weight.
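The sketch below illustrates how the labeling and tree-building steps above might be scripted. It only assembles command lines for cd-hit-2d, MAFFT, FastTree and IQ-TREE 1.6 and does not run them. The file names, the output prefix `rt_vs_ref` and the helper `phylogeny_commands` are hypothetical. The flags reflect these tools' documented options as far as we know them, for example FastTree's `-wag` and `-gamma` (FastTree uses 20 CAT rate categories by default). Check them against the installed versions before use. The LG+R10 model is fixed here rather than re-selected with ModelFinder.

```python
import shlex
from typing import Dict, List


def phylogeny_commands(ref_faa: str, rt_faa: str, motif_faa: str,
                       motif_aln: str) -> Dict[str, List[str]]:
    """Build (but do not run) argv lists for each step; all file names are placeholders."""
    return {
        # cd-hit-2d: compare UG/Abi RTs (db2) against reference entries (db1) at 95% identity;
        # RTs grouped with a reference sequence are reported in the resulting .clstr file
        "cd-hit-2d": ["cd-hit-2d", "-i", ref_faa, "-i2", rt_faa,
                      "-o", "rt_vs_ref", "-c", "0.95"],
        # MAFFT with default parameters; the alignment is written to stdout
        "mafft": ["mafft", motif_faa],
        # FastTree on a protein alignment: WAG model, then Gamma20 branch-length rescaling
        "fasttree": ["FastTree", "-wag", "-gamma", motif_aln],
        # IQ-TREE 1.6: fixed LG+R10 model, 1000 UFBoot and 1000 SH-aLRT replicates
        "iqtree": ["iqtree", "-s", motif_aln, "-m", "LG+R10",
                   "-bb", "1000", "-alrt", "1000"],
    }


if __name__ == "__main__":
    cmds = phylogeny_commands("reference_rts.faa", "ug_abi_rts.faa",
                              "rt_motifs.faa", "rt_motifs.aln")
    for step, argv in cmds.items():
        # shlex.join quotes each argument so the line can be pasted into a shell
        print(f"{step}: {shlex.join(argv)}")
```

Returning argv lists rather than shell strings lets them be passed straight to `subprocess.run`. In that case the stdout of MAFFT and FastTree would need to be redirected to files.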

Retrieval and clustering of neighbor proteins

Coding sequences (CDS) located within ±10 kb of the start and end of the query proteins were retrieved using the feature table resource from the NCBI Entrez API (26). Because ORFs are frequently misannotated, the nucleotide sequences of the intergenic regions were also retrieved. ORFs were predicted in them using Prodigal with the -c -m -n -p meta parameters (34). Neighbor CDS and the ORFs predicted in intergenic sequences were pooled and clustered using MMseqs2 (35). Clustering used a previously described profile-based deep clustering method (36) run for 10 iterations and yielded 3240 neighbor clusters.
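The ±10 kb neighborhood retrieval can be pictured as an interval query on contig coordinates. The sketch below is a hypothetical illustration, not the authors' code. It keeps every annotated CDS that overlaps the window around a query RT, then lists the uncovered intergenic stretches that would be passed to Prodigal for ORF prediction. It assumes 1-based inclusive coordinates with start ≤ end. It also treats any overlap with the window as "located within" it; both choices are assumptions. The `Feature` record and the function names are illustrative only.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    """Annotated CDS on a contig; 1-based, inclusive, start <= end (hypothetical layout)."""
    protein_id: str
    start: int
    end: int


def window_bounds(query: Feature, contig_length: int, flank: int = 10_000) -> Tuple[int, int]:
    """Clip the +/- flank window around the query CDS to the contig ends."""
    return max(1, query.start - flank), min(contig_length, query.end + flank)


def neighbor_cds(query: Feature, features: List[Feature], lo: int, hi: int) -> List[Feature]:
    """CDS other than the query that overlap the window [lo, hi]."""
    return [f for f in features
            if f.protein_id != query.protein_id and f.end >= lo and f.start <= hi]


def intergenic_gaps(features: List[Feature], lo: int, hi: int) -> List[Tuple[int, int]]:
    """Stretches of [lo, hi] not covered by any annotated CDS (query included)."""
    gaps = []
    cursor = lo  # first position not yet known to be covered by a CDS
    for f in sorted(features, key=lambda feat: feat.start):
        if f.start > hi:
            break  # remaining features start beyond the window
        if f.start > cursor:
            gaps.append((cursor, f.start - 1))
        cursor = max(cursor, f.end + 1)
    if cursor <= hi:
        gaps.append((cursor, hi))
    return gaps


if __name__ == "__main__":
    rt = Feature("RT", 20_001, 21_500)
    cds = [Feature("A", 8_000, 9_500), Feature("B", 12_000, 13_000), rt,
           Feature("C", 25_000, 26_000), Feature("D", 31_000, 32_000),
           Feature("E", 40_000, 41_000)]
    lo, hi = window_bounds(rt, contig_length=50_000)
    print([f.protein_id for f in neighbor_cds(rt, cds, lo, hi)])  # ['B', 'C', 'D']
    print(intergenic_gaps(cds, lo, hi))
```

In this toy contig the window is 10001-31500, so CDS D, which only partly overlaps the window, is still kept as a neighbor. The four gaps between B, RT, C and D are the regions that would be scanned for unannotated ORFs.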

Prediction and annotation of functionally associated genes

Genes functionally associated with UG/Abi RTs were predicted using a previously described methodology (36). Briefly, a presence/absence matrix of neighbor clusters surrounding UG/Abi RT entries was analyzed in search of non-random patterns of association. Based on the distribution of clusters across the tree and the average amino acid identity (AAI) of the co-located RTs, 193 out of 3240 protein clusters with more than 5 members were selected as potentially linked. MSAs and HMM profiles were then built using MAFFT (28) with default parameters and hmmbuild (25), respectively. Protein clusters were annotated using HHsearch against the HH-suite-formatted PFAM (37), CDD (38), COG (39), ECOD (40) and pdb30 databases distributed with HH-suite (41) (Supplementary Table S2). In addition, we compared them against profiles built from the eggNOG (Bacteria, Archaea and Viruses) (42), pVOG (43) and mMGE (44) databases.
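Below is a minimal sketch of the presence/absence bookkeeping described above. It assumes each RT neighborhood has already been reduced to a list of cluster labels, one label per neighbor protein. It covers only three steps: counting cluster members, keeping clusters with more than five members, and recording which RT neighborhoods contain them. The tree- and AAI-based test for non-random association from (36) is not reproduced. The identifiers `rt1`, `c1`, etc. and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def cluster_sizes(neighborhoods: Dict[str, List[str]]) -> Counter:
    """Number of member proteins per cluster across all RT neighborhoods."""
    return Counter(label for labels in neighborhoods.values() for label in labels)


def clusters_above(neighborhoods: Dict[str, List[str]], min_size: int = 5) -> List[str]:
    """Clusters with strictly more than `min_size` members, sorted by name."""
    return sorted(c for c, n in cluster_sizes(neighborhoods).items() if n > min_size)


def presence_absence(neighborhoods: Dict[str, List[str]],
                     clusters: List[str]) -> Dict[str, List[int]]:
    """One row per RT (sorted by ID), one column per cluster: 1 if found nearby, else 0."""
    matrix = {}
    for rt_id in sorted(neighborhoods):
        present = set(neighborhoods[rt_id])
        matrix[rt_id] = [1 if c in present else 0 for c in clusters]
    return matrix


if __name__ == "__main__":
    hoods = {
        "rt1": ["c1", "c2"], "rt2": ["c1", "c2"], "rt3": ["c1", "c3"],
        "rt4": ["c1"], "rt5": ["c1", "c3"], "rt6": ["c1", "c2"],
        "rt7": ["c3"],
    }
    kept = clusters_above(hoods)
    print(kept)  # ['c1']: the only cluster with more than 5 members
    for rt_id, row in presence_absence(hoods, kept).items():
        print(rt_id, row)
```

The rows of this matrix, ordered like the leaves of the RT tree, are what one would scan for clusters that co-occur with particular RT clades.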

Group assignment and refinement of UG/Abi RTs

Sequences were manually assigned to groups based on the phylogeny, the sequence-similarity network, the presence of labeled reference sequences and the presence/absence matrix. Individual sequences that were difficult to classify, had low support or had little neighborhood information were manually removed. This task was performed iteratively, rebuilding the MSAs, phylogenetic trees and SSNs as described above each time. In the end, 5022 bona fide representative UG/Abi RT sequences were retained.

Sequence and structure-based annotation of UG/Abi RTs domains

For every UG/Abi RT group, MSAs were built using MAFFT-einsi (28) with default parameters. Groups with a bimodal length distribution were subdivided into small and large variants, and an MSA was built independently for each. UG/Abi RTs were annotated using HHsearch against the PFAM, COG, CDD and ECOD databases. We then performed structural predictions with trRosetta (45), using the previously built MSAs as input for modeling. The predicted models were compared against the PDB database using the DALI web server (46). For αRep domains found in Class 1 UG/Abi RTs, motif boundaries were obtained from trRosetta contact maps and predicted structure models. The trimmed MSAs were then used as queries for further trRosetta structure predictions. For every predicted Class 1 repeat structure model, a multiple protein structure alignment was built using the mTM-align web server (47). The cealign command of the PyMOL Molecular Graphics System (Schrödinger, LLC) was also used, with the UG8 αRep domain as the anchor.
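The text does not state how bimodal length distributions were detected. The sketch below shows one hypothetical way to make the small/large split: sort the sequence lengths and cut at the widest gap, provided that gap reaches a minimum size. The `min_gap` default of 100 residues is an arbitrary placeholder, not a parameter from this work. The function name `split_small_large` is illustrative only.

```python
from typing import Dict, List, Optional, Tuple


def split_small_large(lengths: Dict[str, int],
                      min_gap: int = 100) -> Optional[Tuple[List[str], List[str]]]:
    """Split sequence IDs into (small, large) variants at the widest length gap.

    Returns None when the group has fewer than two sequences or when no gap
    reaches `min_gap`, i.e. the lengths are not treated as bimodal.
    """
    ranked = sorted(lengths.items(), key=lambda kv: kv[1])
    if len(ranked) < 2:
        return None
    # Each candidate is (gap size, index of the last sequence before the gap)
    widest, cut = max((ranked[i + 1][1] - ranked[i][1], i) for i in range(len(ranked) - 1))
    if widest < min_gap:
        return None
    small = [seq_id for seq_id, _ in ranked[:cut + 1]]
    large = [seq_id for seq_id, _ in ranked[cut + 1:]]
    return small, large


if __name__ == "__main__":
    group = {"s1": 420, "s2": 435, "s3": 450, "l1": 780, "l2": 800}
    print(split_small_large(group))  # (['s1', 's2', 's3'], ['l1', 'l2'])
```

Each returned subset would then be aligned separately. A density-based or mixture-model test would be a more principled alternative for noisy length distributions.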

Taxonomy assignment

To determine the taxonomic distribution of every UG/Abi RT group, every representative sequence was queried against the NCBI taxonomy database (26), and information about the domain, phylum, class, order, family, and genus was retrieved for every associated genome (Supplementary Table S1). Then, relative abundances of phyla across the different UG/Abi RT groups were calculated and plotted using the ggplot2 R package (48). For every group, those phyla with 


